CHSRegExp - a PowerPlant object that provides simple regular expression pattern matching capability.
When CHSRegExp encounters an error of any kind, it throws a C string error. A test application is included that shows how to use CHSRegExp and how to catch its errors.
This code "should" compile under any ANSI C++ compiler... however, I have only tested it with Metrowerks CodeWarrior.
CHSRegExp provides an object that is initialized with a search pattern. Any number of CHSRegExp objects may be in use at any one time (subject to memory constraints :-). Calling regexec with an arbitrarily long C string searches that string for the object's regular expression. Up to 9 sub-expressions may be matched by enclosing each one in parentheses, '(' and ')'.
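The general shape of "build an object from a pattern, call regexec, catch a C string on error" can be sketched as follows. This is only an illustration: StandInRegExp and TrySearch are hypothetical names, the class is built on std::regex rather than on the CHSRegExp code, and the real constructor and regexec signatures may differ, so check the CHSRegExp header and the included test application for the actual interface.

```cpp
#include <regex>

// Hypothetical stand-in, NOT the real CHSRegExp class. It only mimics the
// behaviour described above: it is built from a pattern, regexec() searches
// a C string, and every error is thrown as a C string.
class StandInRegExp {
public:
    explicit StandInRegExp(const char* pattern) {
        if (pattern == nullptr) throw "null pattern";
        try {
            mRegex.assign(pattern, std::regex::extended);
        } catch (const std::regex_error&) {
            // A string literal has static lifetime, so it is safe to throw.
            throw "invalid pattern";
        }
    }

    // Returns true if the pattern occurs anywhere in text.
    bool regexec(const char* text) const {
        if (text == nullptr) throw "null search string";
        return std::regex_search(text, mRegex);
    }

private:
    std::regex mRegex;
};

// Returns 1 on a match, 0 on no match, and -1 if a C string error was thrown.
int TrySearch(const char* pattern, const char* text) {
    try {
        StandInRegExp re(pattern);
        return re.regexec(text) ? 1 : 0;
    } catch (const char* message) {
        // A real application would report the message to the user here.
        (void)message;
        return -1;
    }
}
```

The point to carry over is the handler type: because the error is thrown as a C string, it is caught with catch (const char*) rather than with a std::exception handler.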
A regular expression is zero or more branches, separated by '|'. It matches anything that matches one of the branches.
A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.
A piece is an atom possibly followed by '*', '+', or '?'. An atom followed by '*' matches a sequence of 0 or more matches of the atom. An atom followed by '+' matches a sequence of 1 or more matches of the atom. An atom followed by '?' matches a match of the atom, or the null string.
An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), '.' (matching any single character), '^' (matching the null string at the beginning of the input string), '$' (matching the null string at the end of the input string), a '\' followed by a single character (matching that character), or a single character with no other significance (matching that character).
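To see branches, pieces, and atoms in action, the sketch below runs a few patterns through std::regex with the POSIX extended grammar, which treats these basic constructs the same way. It is a stand-in chosen so that the example compiles on its own, not the CHSRegExp code, and FirstMatch is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of pattern in text, or an empty string
// if there is no match (so an empty match and "no match" look the same here).
// Uses std::regex as a stand-in; this is NOT the CHSRegExp implementation.
std::string FirstMatch(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::extended);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

For example, FirstMatch("cat|dog", "hot dog") picks the second branch, FirstMatch("ab*", "xabbby") shows '*' absorbing repeated atoms, FirstMatch("colou?r", "color") shows '?' matching the null string, and FirstMatch("a\\.c", "abc a.c") shows a backslash turning '.' into a literal character.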
A range is a sequence of characters enclosed in '[]'. It normally matches any single character from the sequence. If the sequence begins with '^', it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by '-', this is shorthand for the full list of ASCII characters between them (e.g. '[0-9]' matches any decimal digit). To include a literal ']' in the sequence, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character.
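The range rules above can be written out as a small function. This is a minimal sketch of the described behaviour, not the code CHSRegExp uses internally; MatchesRange is a hypothetical name, and it takes only the text between '[' and ']' (the closing bracket is assumed to have been found already, which is why a leading ']' needs no special handling here).

```cpp
#include <cstddef>
#include <string>

// Illustration only: tests one character against the body of a range,
// i.e. the characters between '[' and ']'. Not the CHSRegExp code.
bool MatchesRange(const std::string& body, char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    std::size_t i = 0;
    bool negate = false;

    // A leading '^' inverts the sense of the whole range.
    if (i < body.size() && body[i] == '^') {
        negate = true;
        ++i;
    }

    bool found = false;
    for (; i < body.size(); ++i) {
        const unsigned char lo = static_cast<unsigned char>(body[i]);
        // "a-b" shorthand: the '-' must have a character on both sides,
        // so a '-' in first or last position falls through as a literal.
        if (i + 2 < body.size() && body[i + 1] == '-') {
            const unsigned char hi = static_cast<unsigned char>(body[i + 2]);
            if (lo <= uc && uc <= hi) found = true;
            i += 2;  // skip the '-' and the upper bound
        } else if (lo == uc) {
            found = true;
        }
    }
    return found != negate;
}
```

With this sketch, MatchesRange("0-9", '5') is true, MatchesRange("^0-9", '5') is false, and both MatchesRange("a-", '-') and MatchesRange("-a", '-') are true because the '-' sits at an end of the sequence.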
Known bugs: none :-). Send any problem reports/praise/suggestions to: tom@itsnet.com
CHSRegExp is an encapsulation of the regular expression code written by Henry Spencer (the HS in CHSRegExp), at the University of Toronto. Parts of this readme file are from the man page for that code.
I have no clue where I got my copy of the code... I've had it at *least* 4 years. If anyone recognizes it, could they send me the man page? The information in here is from the original 1985 version of the man page.
/*
 * Copyright (c) 1986 by University of Toronto.
 * Written by Henry Spencer. Not derived from licensed software.
 *
 * Permission is granted to anyone to use this software for any
 * purpose on any computer system, and to redistribute it freely,
 * subject to the following restrictions:
 *
 * 1. The author is not responsible for the consequences of use of
 *    this software, no matter how awful, even if they arise
 *    from defects in it.
 *
 * 2. The origin of this software must not be misrepresented, either
 *    by explicit claim or by omission.
 *
 * 3. Altered versions must be plainly marked as such, and must not
 *    be misrepresented as being the original software.
 *** THIS IS AN ALTERED VERSION. It was altered by John Gilmore,
 ***    hoptoad!gnu, on 27 Dec 1986, to add \n as an alternative to |
 ***    to assist in implementing egrep.
 *** THIS IS AN ALTERED VERSION. It was altered by John Gilmore,
 ***    hoptoad!gnu, on 27 Dec 1986, to add \< and \> for word-matching
 ***    as in BSD grep and ex.
 *** THIS IS AN ALTERED VERSION. It was altered by John Gilmore,
 ***    hoptoad!gnu, on 28 Dec 1986, to optimize characters quoted with \.
 *** THIS IS AN ALTERED VERSION. It was altered by James A. Woods,
 ***    ames!jaw, on 19 June 1987, to quash a regcomp() redundancy.
 *** THIS IS AN ALTERED VERSION. It was altered by Thomas R. Kimpton,
 ***    tom@dtint.dtint.com, on 21Oct92, to use Macintosh handles, and
 ***    explicit types, i.e. longs or shorts, no ints.
 *
 *** THIS IS AN ALTERED VERSION. It was altered by Thomas R. Kimpton,
 ***    tom@itsnet.com, on 29Aug96. This is now a PowerPlant object (I'm
 ***    hoping it's also a C++ object :-). All the static 'helper' variables
 ***    are now private member variables. And, because I wanted to see if I
 ***    could make this code 'portable', I decided to use malloc/free instead
 ***    of Macintosh Handles, and C strings.
 *
 * Beware that some of this code is subtly aware of the way operator
 * precedence is structured in regular expressions. Serious changes in
 * regular-expression syntax might require a total rethink.
 */
CHSRegExp is supplied as is. Thomas R. Kimpton hereby disclaims all warranties relating to this software, whether express or implied, including without limitation any implied warranties of merchantability or fitness for a particular purpose. Thomas R. Kimpton will not be liable for any special, incidental, consequential, indirect or similar damages due to loss of data or any other reason, even if Thomas R. Kimpton, or an agent of Thomas R. Kimpton has been advised of the possibility of such damages. In no event shall Thomas R. Kimpton's liability for any damages ever exceed the price paid for the license to use the software, regardless of the form of the claim. The person using the software bears all risk as to the quality and performance of the software.
phew!